Skip to content

feat(renderer): expose preserve_*_thinking flags in RendererConfig#2436

Merged
samsja merged 2 commits into
mainfrom
feat/preserve-thinking-flags-main
May 17, 2026
Merged

feat(renderer): expose preserve_*_thinking flags in RendererConfig#2436
samsja merged 2 commits into
mainfrom
feat/preserve-thinking-flags-main

Conversation

@hallerite
Copy link
Copy Markdown
Member

@hallerite hallerite commented May 7, 2026

Summary

Re-applies #2433 onto main. The original PR landed against feat/unify-inference-generate and was merged there by mistake.

Adds two new fields to RendererConfig:

  • preserve_all_thinking: bool = False
  • preserve_thinking_between_tool_calls: bool = False

Both RL and SFT forward the flags to keep tokenization byte-consistent:

  1. Orchestrator create_renderer() — bound at construction on the training-side renderer used by build_trajectory_step / render_ids for trajectory tokenization.
  2. Orchestrator setup_inference_pool()setup_clients()vf.ClientConfig — the verifiers RendererClient picks them up and forwards to its create_renderer_pool, so every inference render carries the same thinking-preservation behaviour.
  3. SFT create_renderer() — same construction-time binding on the training renderer, so SFT tokenization matches the configured preservation policy.

Both flags are off by default → zero behaviour change for existing configs. Both the orchestrator and SFT validate_renderer_args validators reject either flag when use_renderer=False, matching the existing renderer knobs.

Per-flag semantics

  • preserve_all_thinking — emit reasoning_content for every past-assistant turn, even those before another user message.
  • preserve_thinking_between_tool_calls — emit reasoning_content only for the current tool cycle (post-last-user assistant→tool→…→assistant block). Older blocks fall back to template default. Strict subset of preserve_all_thinking.

Example config

[orchestrator.renderer]
name = "glm5"
preserve_all_thinking = true

Dependencies

No pyproject.toml / uv.lock changes. After the monorepo split in #2507, verifiers and renderers are uv workspace members vendored under deps/. Main's current submodule pins already include the upstream changes this PR depends on:

  • deps/verifiers @ 711a7c7 — 76 commits ahead of the verifiers PR (Run dir? #1298) that added preserve_*_thinking to ClientConfig.
  • deps/renderers @ 87084dc — 36 commits ahead of the renderers PR (Data streaming #4) that added construction-time preserve_*_thinking to create_renderer.

File-path delta vs #2433

The original PR targeted feat/unify-inference-generate where configs live under src/prime_rl/configs/. On main they live under packages/prime-rl-configs/src/prime_rl/configs/ (workspace-package split landed in #2416). The diff is otherwise identical.

🤖 Generated with Claude Code


Note

Medium Risk
Adds new renderer configuration knobs and threads them through orchestrator/inference client construction; while default-off, it touches rollout/inference plumbing and bumps verifiers plus adds a new renderers dependency, which could affect runtime compatibility.

Overview
Exposes two new RendererConfig booleans (preserve_all_thinking and preserve_thinking_between_tool_calls) and extends orchestrator validation to reject them unless orchestrator.use_renderer=true.

When the renderer client is enabled, these flags are forwarded both to create_renderer() and through setup_inference_pool()/setup_clients() into vf.ClientConfig (guarded so non-renderer clients don’t receive unknown extras), keeping training-side tokenization and inference rendering consistent.

Updates dependencies by adding a direct renderers git dependency and bumping the pinned verifiers revision; unit tests are adjusted to assert the new parameters are passed through.

Reviewed by Cursor Bugbot for commit cc381eb. Bugbot is set up for automated code reviews on this repo. Configure here.

@samsja samsja marked this pull request as ready for review May 16, 2026 06:30
Adds two new fields to RendererConfig:
  - preserve_all_thinking
  - preserve_thinking_between_tool_calls

The orchestrator forwards them in two places so train and infer stay
consistent:

  1. create_renderer() — bound at construction on the training-side
     renderer used by build_trajectory_step / render_ids.
  2. setup_inference_pool() → setup_clients() → vf.ClientConfig — the
     verifiers RendererClient picks them up and forwards to its
     create_renderer_pool, so every inference render carries the same
     thinking-preservation behaviour.

Both flags are off by default → zero behaviour change for existing
configs. Setting either without orchestrator.use_renderer=True is
rejected by validate_renderer_args, matching the existing renderer
knobs.

Bumps source pins:
  - verifiers 3b77145 → a7516a1 (PR #1298: ClientConfig propagation)
  - renderers (now a separate repo): pinned to fe67f9f (PR #4:
    construction-time preserve flags)

Re-applies #2433, which was merged into feat/unify-inference-generate
by mistake.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@hallerite hallerite force-pushed the feat/preserve-thinking-flags-main branch from cc381eb to 2544d0c Compare May 16, 2026 09:42
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 2544d0c. Configure here.

Comment thread packages/prime-rl-configs/src/prime_rl/configs/shared.py
The flags were added to the shared RendererConfig but only wired up
in the orchestrator. SFT also constructs a renderer via create_renderer
for training-side tokenization, so it must forward both flags or
silently ignore them.

Also tighten validate_renderer_args to reject either flag when
use_renderer=False, matching the orchestrator validator.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@samsja samsja merged commit ae610d9 into main May 17, 2026
16 checks passed
@hallerite hallerite deleted the feat/preserve-thinking-flags-main branch May 17, 2026 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants